Classifying Unseen Cases with Many Missing Values

Authors

  • Zijian Zheng
  • Boon Toh Low
Abstract

Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques, Boosting, Bagging, Sasc, and SascMB, relative to C4.5 for tolerating missing values in test data. In terms of average error over a representative collection of natural domains, Boosting is found to have a level of robustness similar to that of C4.5. Bagging performs slightly better than Boosting, while Sasc and SascMB outperform them in this regard, with SascMB performing best. Furthermore, we propose a novel voting weight scheme for the committee learning techniques. Although very simple, it can improve the robustness of all four committee learning techniques for tolerating missing values in test data, especially when many missing values exist.

Similar resources

Classifying Unseen Cases with Many

Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques, including Boosti...

Using Association Rules to Make Rule-based Classifiers Robust

Rule-based classification systems have been widely used in real world applications because of the easy interpretability of rules. Many traditional rule-based classifiers prefer small rule sets to large rule sets, but small classifiers are sensitive to the missing values in unseen test data. In this paper, we present a larger classifier that is less sensitive to the missing values in unseen test...

A Critique of the View Claiming Conflict in the Verses of the Knowledge of the Unseen

The claim of conflict in the verses of the knowledge of the unseen in the Quran is one of those made by Brasher – the Jewish orientalist. He believes that the verses which consider the knowledge of the unseen to be specific to God alone are in conflict with those verses that apparently refer to the awareness of the unseen held by the Prophet (p.b.u.h) and some of the divinely selected people. Classifying the ver...

Generalization to Unseen Cases

We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a data-dependent bound on the difference between off-training-set and standard generalization error. Our resu...

Replace Missing Values with EM algorithm based on GMM and Naïve Bayesian

In data mining applications, experimental datasets contain various kinds of missing values. Leaving missing values unsubstituted, or treating them inappropriately, is highly likely to cause many warnings or errors. Besides, many classification algorithms are very sensitive to missing values. For these reasons, handling missing values is an important phase in many classification or d...


Publication date: 1999